Consistent Estimation of Mixed Memberships with Successive Projections

Author: G Palla
N Gillis
T Mizutani
U Luxburg Von
Z Lu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/10/2017

This paper considers the parameter estimation problem in the Mixed Membership Stochastic Block Model (MMSB), a fairly general random graph model that allows for overlapping community structure. We present a new algorithm, successive projection overlapping clustering (SPOC), which combines ideas from spectral clustering with the geometric approach to separable non-negative matrix factorization. The proposed algorithm is provably consistent under MMSB with general conditions on the model parameters. SPOC is also shown to perform well experimentally in comparison with other algorithms.

arXiv.org e-Print Archive

Crossref

Alternative sampling for variational quantum Monte Carlo

Author: A. J. Izenman
B. V. Gnedenko
D. W. Stroock
E. Fieller
J. R. Trail
R. J. Needs
U. von Luxburg
Publication venue: 'American Physical Society (APS)'
Publication date: 30/09/2009

Expectation values of physical quantities may be obtained accurately by evaluating integrals within many-body quantum mechanics, and these multi-dimensional integrals may be estimated using Monte Carlo methods. In a previous publication it has been shown that for the simplest, most commonly applied strategy in continuum Quantum Monte Carlo, the random error in the resulting estimates is not well controlled. At best the Central Limit Theorem is valid in its weakest form, and at worst it is invalid and replaced by an alternative Generalised Central Limit Theorem and non-Normal random error. In both cases the random error is not controlled. Here we consider a new 'residual sampling strategy' that reintroduces the Central Limit Theorem in its strongest form, and provides full control of the random error in estimates. Estimates of the total energy and the variance of the local energy within Variational Monte Carlo are considered in detail, and the approach presented may be generalised to expectation values of other operators, and to other variants of the Quantum Monte Carlo method. Comment: 14 pages, 9 figures
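As a rough illustration of the random error this abstract discusses, the sketch below estimates a one-dimensional integral by plain Monte Carlo and reports the standard error that the Central Limit Theorem justifies when the sampled values have finite variance. It is not the residual sampling strategy of the paper; the integrand, the sample size, the seed, and the function name `mc_estimate` are illustrative assumptions.

```python
import math
import random


def mc_estimate(f, n, seed=0):
    """Plain Monte Carlo estimate of the integral of f over [0, 1].

    Returns the sample mean and its estimated standard error.
    The standard error is only meaningful if f(U) has finite variance.
    """
    rng = random.Random(seed)
    values = [f(rng.random()) for _ in range(n)]
    mean = sum(values) / n
    # Unbiased sample variance of the individual evaluations.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# Hypothetical test integrand: x^2 on [0, 1], whose exact integral is 1/3.
estimate, std_error = mc_estimate(lambda x: x * x, 20000)
```

When the integrand has heavy tails, the sample variance no longer converges and the reported standard error stops being a reliable guide, which is the situation the abstract describes as poorly controlled.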

arXiv.org e-Print Archive

Crossref

Comparing spectra of graph shift operator matrices

Author: A Ortega
B Crawford
B Karrer
J Lei
K Rohe
Luca Castelli Aleardi
N Tremblay
P Mieghem van
PY Chen
S Fortunato
U Luxburg von
WW Zachary
Publication venue: Springer Cham
Publication date: 01/11/2019

Typically network structures are represented by one of three different graph shift operator matrices: the adjacency matrix and unnormalised and normalised Laplacian matrices. To enable a sensible comparison of their spectral (eigenvalue) properties, an affine transform is first applied to one of them, which preserves eigengaps. Bounds, which depend on the minimum and maximum degree of the network, are given on the resulting eigenvalue differences. The monotonicity of the bounds and the structure of networks are related. Bounds, which again depend on the minimum and maximum degree of the network, are also given for normalised eigengap differences, used in spectral clustering. Results are illustrated on the karate dataset and a stochastic block model. If the degree extreme difference is large, different choices of graph shift operator matrix may give rise to disparate inference drawn from network analysis; contrariwise, smaller degree extreme difference results in consistent inference
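The three graph shift operators named in this abstract can be sketched in a few lines. The example below builds them for a small regular graph, where the relation between them is exactly affine: with common degree d, the unnormalised Laplacian is dI - A and the symmetric normalised Laplacian is I - A/d. The specific transform and bounds used in the paper are not reproduced here; the helper names and the 4-cycle test graph are assumptions made for illustration.

```python
def adjacency(n, edges):
    """Dense adjacency matrix of an undirected, unweighted graph."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1.0
        A[v][u] = 1.0
    return A


def degrees(A):
    """Node degrees, computed as row sums of the adjacency matrix."""
    return [sum(row) for row in A]


def unnormalised_laplacian(A):
    """L = D - A."""
    deg = degrees(A)
    n = len(A)
    return [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]


def normalised_laplacian(A):
    """L_sym = I - D^{-1/2} A D^{-1/2}; assumes no isolated nodes."""
    deg = degrees(A)
    n = len(A)
    return [[(1.0 if i == j else 0.0) - A[i][j] / (deg[i] * deg[j]) ** 0.5
             for j in range(n)]
            for i in range(n)]


# Hypothetical example graph: a 4-cycle, in which every node has degree 2.
A = adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
L = unnormalised_laplacian(A)
L_sym = normalised_laplacian(A)
d = degrees(A)[0]
```

For graphs whose degrees are not all equal, these identities hold only approximately, which is why the abstract states its bounds in terms of the minimum and maximum degree.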

Crossref

Spiral - Imperial College Digital Repository

Large Scale Spectral Clustering Using Approximate Commute Time Embedding

Author: C. Fowlkes
D. Achlioptas
D. Mavroeidis
D.A. Spielman
F. Fouss
H. Qiu
I. Koutis
L. Wang
P.G. Doyle
U. von Luxburg
W.Y. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Spectral clustering is a novel clustering method which can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, whose cost is proportional to O(n^3), and thus it is not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computation time of spectral clustering. These approximate methods usually involve sampling techniques, by which a lot of information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither a sampling technique nor the computation of any eigenvector. Instead, it uses random projection and a linear time solver to find the approximate embedding. The experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than state-of-the-art approximate spectral clustering methods.

arXiv.org e-Print Archive

Crossref

Graph similarity through entropic manifold alignment

Author: Barrow H. G.
Cho M.
Cour T.
Duchenne O.
Edwin R. Hancock
Escolano F.
Francisco Escolano
Jiang B.
Kelsey J.
Leordeanu M.
Martins A.
Miguel A. Lozano
Nadler B.
Sanfeliu A.
von Luxburg U.
Zass R.
Zhou F.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2017

In this paper we decouple the problem of measuring graph similarity into two sequential steps. The first step is the linearization of the quadratic assignment problem (QAP) in a low-dimensional space, given by the embedding trick. The second step is the evaluation of an information-theoretic distributional measure, which relies on deformable manifold alignment. The proposed measure is a normalized conditional entropy, which induces a positive definite kernel when symmetrized. We use bypass entropy estimation methods to compute an approximation of the normalized conditional entropy. Our approach, which is purely topological (i.e., it does not rely on node or edge attributes although it can potentially accommodate them as additional sources of information) is competitive with state-of-the-art graph matching algorithms as sources of correspondence-based graph similarity, but its complexity is linear instead of cubic (although the complexity of the similarity measure is quadratic). We also determine that the best embedding strategy for graph similarity is provided by commute time embedding, and we conjecture that this is related to its inversibility property, since the inverse of the embeddings obtained using our method can be used as a generative sampler of graph structure. The work of the first and third authors was supported by the projects TIN2012-32839 and TIN2015-69077-P of the Spanish Government. The work of the second author was supported by a Royal Society Wolfson Research Merit Award.

Repositorio Institucional de la Universidad de Alicante

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

White Rose Research Online

A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks

Author: A Anandkumar
B Ball
B Karrer
E Airoldi
G Palla
J Xie
K Rohe
ME Newman
P Latouche
PW Holland
RN Shepard
S Chatterjee
S Zhang
U Luxburg Von
Y Zhao
Publication date: 01/01/2017

This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, which does not require knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities. Comment: Journal of Theoretical Computer Science (TCS), Elsevier, A Para\^itr

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

On the Interplay between Strong Regularity and Graph Densification

Author: J Komlós
M Pelillo
N Alon
T Gowers
U Luxburg von
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

In this paper we analyze the practical implications of Szemerédi’s regularity lemma in the preservation of metric information contained in large graphs. To this end, we present a heuristic algorithm to find regular partitions. Our experiments show that this method is quite robust to the natural sparsification of proximity graphs. In addition, this robustness can be enforced by graph densification

arXiv.org e-Print Archive

Archivio Ricerca Ca'Foscari

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Learning an atlas of a cognitive process in its functional geometry

Author: B. Thirion
D. Lashkari
G. Langs
G. Scott
K. Friston
L.R. Dice
M.W. Woolrich
P.K. Kuhl
R. Saxe
R.L. Buckner
R.R. Coifman
T. Elbert
U. Luxburg Von
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Proceedings of the 22nd International Conference, IPMI 2011, Kloster Irsee, Germany, July 3-8, 2011. In this paper we construct an atlas that captures functional characteristics of a cognitive process from a population of individuals. The functional connectivity is encoded in a low-dimensional embedding space derived from a diffusion process on a graph that represents correlations of fMRI time courses. The atlas is represented by a common prior distribution for the embedded fMRI signals of all subjects. The atlas is not directly coupled to the anatomical space, and can represent functional networks that are variable in their spatial distribution. We derive an algorithm for fitting this generative model to the observed data in a population. Our results in a language fMRI study demonstrate that the method identifies coherent and functionally equivalent regions across subjects. National Science Foundation (U.S.) (IIS/CRCNS 0904625); National Science Foundation (U.S.) (CAREER grant 0642971); National Institutes of Health (U.S.) (NCRR NAC P41- RR13218); National Institute of Biomedical Imaging and Bioengineering (U.S.) (U54-EB005149); National Institutes of Health (U.S.) (U41RR019703); National Institutes of Health (U.S.) (P01CA067165); Seventh Framework Programme (European Commission) (n°257528 (KHRESMOI)

DSpace@MIT

Crossref

PubMed Central

Mathematical Analysis of Copy Number Variation in a DNA Sample Using Digital PCR on a Nanofluidic Device

Author: A Papoulis
AJ Iafrate
AS Kapadia
B Vogelstein
DJ Sheskin
E Fieller
H Motulsky
HH Ropers
J Sebat
Jian Qin
JR Lupski
KK Wong
M Baer
NP Carter
R Redon
R Sindelka
Ramesh Ramakrishnan
RL Scheaffer
Simant Dube
SL Emery
SL Spurgeon
U von Luxburg
Xiaolin Wu
YM Lo
Publication venue: Public Library of Science
Publication date: 06/08/2008

Copy Number Variations (CNVs) of regions of the human genome have been associated with multiple diseases. We present an algorithm which is mathematically sound and computationally efficient to accurately analyze CNV in a DNA sample utilizing a nanofluidic device, known as the digital array. This numerical algorithm is utilized to compute copy number variation and the associated statistical confidence interval and is based on results from probability theory and statistics. We also provide formulas which can be used as close approximations
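The abstract does not spell out its formulas. A common starting point in digital PCR analysis, shown here only as a sketch, is the Poisson correction: if a fraction p of partitions is positive, the mean number of target copies per partition is estimated as -ln(1 - p), and a relative copy number follows as the ratio of that estimate for a target assay to that for a reference assay. The function names and the counts below are hypothetical, and the confidence interval described in the abstract is not implemented.

```python
import math


def copies_per_partition(positive, total):
    """Mean copies per partition, assuming Poisson-distributed loading.

    positive: number of partitions showing amplification
    total:    number of partitions analysed
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number_ratio(pos_target, pos_reference, total):
    """Ratio of target to reference concentration on equally sized panels."""
    reference = copies_per_partition(pos_reference, total)
    if reference == 0.0:
        raise ValueError("reference assay has no positive partitions")
    return copies_per_partition(pos_target, total) / reference


# Hypothetical counts: 300 and 200 positive partitions out of 1000 each.
ratio = copy_number_ratio(pos_target=300, pos_reference=200, total=1000)
```

The correction matters because a positive partition may hold more than one molecule, so the raw ratio of positive counts understates the true ratio when the panels are heavily loaded.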

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

k is the Magic Number -- Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

Author: C Bauckhage
C Böhm
CM Bishop
FRK Chung
HW Kuhn
JA Tropp
JN Kather
R Tibshirani
RA Horn
SP Lloyd
U von Luxburg
Y LeCun
Publication date: 04/07/2019

Most convex and nonconvex clustering algorithms come with one crucial parameter: the k in k-means. To this day, there is not one generally accepted way to accurately determine this parameter. Popular methods are simple yet theoretically unfounded, such as searching for an elbow in the curve of a given cost measure. In contrast, statistically founded methods often make strict assumptions about the data distribution or come with their own optimization scheme for the clustering objective. This limits either the set of applicable datasets or the set of applicable clustering algorithms. In this paper, we strive to determine the number of clusters by answering a simple question: given two clusters, is it likely that they jointly stem from a single distribution? To this end, we propose a bound on the probability that two clusters originate from the distribution of the unified cluster, specified only by the sample mean and variance. Our method is applicable as a simple wrapper to the result of any clustering method minimizing the objective of k-means, which includes Gaussian mixtures and Spectral Clustering. We focus in our experimental evaluation on an application for nonconvex clustering and demonstrate the suitability of our theoretical results. Our SpecialK clustering algorithm automatically determines the appropriate value for k, without requiring any data transformation or projection, and without assumptions on the data distribution. Additionally, it is capable of deciding that the data consists of only a single cluster, which many existing algorithms cannot.

arXiv.org e-Print Archive

Crossref

Pure OAI Repository